Accessing geospatial data the easy way (Python)

Access to geospatial data has changed significantly over the past decade. Traditionally, data were accessed by downloading many files to a local computer and then analyzing them with dedicated software or a programming language. Obtaining analysis-ready datasets has always been difficult because of the diversity of data formats (NetCDF, GRIB2, GeoTIFF, Shapefile, etc.) and the variety of access protocols used by different providers (OPeNDAP, HTTPS, SFTP, WPS, REST APIs, data marts, etc.). Moreover, as geospatial datasets keep growing, many modern datasets no longer fit on a local computer at all, which limits scientific progress.

The datasets presented here are large-scale, analysis-ready, cloud-optimized (ARCO) datasets. To provide a single entry point to them, we followed the methodology developed by the Pangeo community, which combines several technologies:
- Data lake (S3, Azure Data Lake Storage, GCS, etc.): distributed object storage
- Zarr (or alternatively TileDB, COGs): chunked N-dimensional array formats
- Dask (or alternatively Spark, Ray, Distributed): distributed computing and lazy loading
- Intake catalogs (or alternatively STAC): a general interface for loading different data formats, mostly but not limited to spatiotemporal assets
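In a chunked format such as Zarr, each array is split into a regular grid of chunks and every chunk is stored as a separate object in the data lake, so reading a value only requires fetching the object that contains it. The sketch below shows that mapping under an assumption: it follows the Zarr v2 convention in which a chunk's key is its grid position joined by "." (the separator is configurable, and Zarr v3 uses a different default layout). The function names are illustrative, not part of any library.

```python
def chunk_index(index, chunks):
    """Position, in the chunk grid, of the chunk holding the element at `index`."""
    return tuple(i // c for i, c in zip(index, chunks))


def chunk_key(index, chunks, separator="."):
    """Zarr v2-style object key of the chunk holding `index` (assumed '.' separator)."""
    return separator.join(str(k) for k in chunk_index(index, chunks))


# Illustrative example: element (time index 0, latitude index 180, longitude index 420)
# in an array chunked as (14880, 15, 15), like the time-series version of ERA5 below
print(chunk_key((0, 180, 420), (14880, 15, 15)))  # 0.12.28
```

Only that one object has to be downloaded to read the value, which is what makes lazy, partial reads from object storage cheap.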

For more information, please refer to the Pangeo website.

Keep in mind that most datasets in the catalog are stored in language-agnostic formats, so they can be accessed from any programming language (including Python, Julia, JavaScript, C, etc.) that implements the corresponding specifications (such as Zarr, NetCDF via kerchunk references, GeoJSON, etc.).

[1]:
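from dask.distributed import Client  # Dask scheduler client
import intake                         # catalog access
import hvplot.xarray                  # adds .hvplot to xarray objects
import hvplot.pandas                  # adds .hvplot to pandas objects
import xoak                           # nearest-neighbour indexing on 2-D lat/lon coordinates
import xarray as xr
import numpy as np
import pandas as pd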

Dask client

We start a Dask client so that all subsequent Dask-compatible computations run in parallel.

[2]:
client = Client()
client
[2]:

Client

Client-00a11fa7-6697-11ed-8a5d-000d3a5af985

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status


Intake catalogs

Intake is a lightweight package for finding, investigating, loading and disseminating data. A catalog organizes a collection of datasets, and each entry is opened by a data loader (driver) that is parameterized to return the data in the format expected by the end user. In Python, for example, multidimensional gridded data are opened as xarray objects, while polygons (shapefiles, GeoJSON) are opened with geopandas.

Here is the URL where you can access the catalog:

[3]:
catalog_url = 'https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml'
cat=intake.open_catalog(catalog_url)
cat
main:
  args:
    path: https://raw.githubusercontent.com/hydrocloudservices/catalogs/main/catalogs/main.yaml
  description: Master Data Catalog
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}

To organize the collection of datasets, the main catalog references several sub-catalogs:

[4]:
[cat[field]
 for field in list(cat._entries.keys())]
[4]:
[<Intake catalog: hydrology>,
 <Intake catalog: atmosphere>,
 <Intake catalog: geography>,
 <Intake catalog: climate_change>]

Our catalog is constantly expanding, but several datasets are already available. The following sections show example queries and analyses for some of them.

The current (flattened) catalog is listed in the table below. Check the status field before using a dataset: a "dev" flag means that we are still actively working on the dataset and do not recommend using it yet, while a "prod" flag means that it is production-ready and has undergone quality review and testing. Users should nevertheless always double-check the data themselves, because errors are still possible.

[5]:
pd.set_option('display.max_colwidth', None)

pd.DataFrame([[field ,
               dataset,
               cat[field][dataset].describe()['description'],
               cat[field][dataset].describe()['metadata']['status'][0]]
              for field in list(cat._entries.keys())
              for dataset in cat[field]._entries.keys()],
            columns=['field', 'dataset_name', 'description', 'status']) \
.sort_values('field')
[5]:
|    | field          | dataset_name                                     | description                                                                                                              | status |
|----|----------------|--------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|--------|
| 1  | atmosphere     | era5_reanalysis_single_levels                    | ERA5 hourly estimates of variables on single levels chunked for time series analysis                                     | prod   |
| 2  | atmosphere     | era5_reanalysis_single_levels_spatial            | ERA5 hourly estimates of variables on single levels chunked for spatial analysis                                         | dev    |
| 3  | atmosphere     | era5_land_reanalysis_spatial                     | ERA5-Land hourly estimates on single level chunked for spatial analysis                                                  | dev    |
| 4  | atmosphere     | era5_reanalysis_pressure_levels                  | ERA5 hourly estimates of variables on pressure levels                                                                    | prod   |
| 5  | atmosphere     | daymet_daily_na                                  | Daymet Data Version 4.0                                                                                                  | prod   |
| 6  | atmosphere     | ghcnd_world                                      | Global Historical Climatology Network daily (GHCNd)                                                                      | dev    |
| 7  | atmosphere     | scdna                                            | SCDNA a serially complete precipitation and temperature dataset for North America from 1979 to 2018                      | prod   |
| 8  | atmosphere     | 20_century_reanalysis_single_levels              | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on single levels spanning 1836 to 2015 chunked for time series analysis   | prod   |
| 9  | atmosphere     | 20_century_reanalysis_single_levels_large_area   | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on single levels spanning 1836 to 2015 chunked for spatial analysis       | prod   |
| 10 | atmosphere     | 20_century_reanalysis_pressure_levels            | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on pressure levels spanning 1836 to 2015 chunked for time series analysis | prod   |
| 11 | atmosphere     | 20_century_reanalysis_pressure_levels_large_area | NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) on pressure levels spanning 1836 to 2015 chunked for spatial analysis     | prod   |
| 12 | atmosphere     | terraclimate                                     | TerraClimate is a dataset of monthly climate and climatic water balance for global terrestrial surfaces from 1958-2019  | prod   |
| 14 | climate_change | rcp45_day_NAM_22i_raw_zarr                       | NA-Cordex (limited to rcp45 for now... more to come!)                                                                    | dev    |
| 13 | geography      | melcc_polygons                                   | MELCC basin delimitation                                                                                                 | dev    |
| 0  | hydrology      | melcc                                            | CEHQ daily flow and water levels                                                                                         | dev    |

1) Atmosphere datasets

a) ERA5 single levels

ERA5 is the fifth generation ECMWF atmospheric reanalysis of the global climate covering the period from January 1950 to present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.
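The idea of combining a forecast with observations "in an optimal way" can be illustrated with the simplest possible case: a single scalar value, where the forecast (background) and the observation are weighted according to their error variances. This is a toy sketch for intuition only, assuming unbiased, independent errors with known variances; ECMWF's operational system (4D-Var) solves a far larger, time-dependent version of this problem. The function name is illustrative.

```python
def analysis_update(background, background_var, observation, observation_var):
    """Toy scalar analysis: combine a forecast value with one observation.

    Assumes unbiased, independent errors with known variances.
    Returns (analysis, analysis_variance).
    """
    # Gain: the fraction of the forecast-observation difference to accept
    gain = background_var / (background_var + observation_var)
    # Move the forecast towards the observation by that fraction
    analysis = background + gain * (observation - background)
    # The analysis error variance is smaller than the forecast error variance
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Forecast 20.0 °C and observation 22.0 °C with equal error variances:
# both are trusted equally, so the analysis lies halfway between them.
a, v = analysis_update(20.0, 1.0, 22.0, 1.0)
print(a, v)  # 21.0 0.5
```

If the observation is less reliable (larger variance), the gain shrinks and the analysis stays closer to the forecast.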

Temporal extent: 01/01/1979 – 12/31/2020
Spatial extent: World: [-180, 180, -90, 90]
Chunks (time-series version): {'time': 14880, 'longitude': 15, 'latitude': 15}
Chunks (spatial version): {'time': 24, 'longitude': 1440, 'latitude': 721}
Spatial resolution: 0.25 degrees
Spatial reference: WGS84 (EPSG:4326)
Temporal resolution: 1 hour
Update frequency: in 2023, we will update it weekly

Data access

[6]:
ds=cat.atmosphere.era5_reanalysis_single_levels.to_dask()
ds
[6]:
<xarray.Dataset>
Dimensions:    (latitude: 721, longitude: 1440, time: 368184)
Coordinates:
  * latitude   (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
  * longitude  (longitude) float32 -180.0 -179.8 -179.5 ... 179.2 179.5 179.8
  * time       (time) datetime64[ns] 1979-01-01 ... 2020-12-31T23:00:00
Data variables:
    t2m        (time, latitude, longitude) float32 dask.array<chunksize=(14880, 15, 15), meta=np.ndarray>
    tp         (time, latitude, longitude) float32 dask.array<chunksize=(14880, 15, 15), meta=np.ndarray>
Attributes:
    institution:  ECMWF
    source:       Reanalysis
    title:        ERA5 forecasts

Working with the data

With xarray, subsets in both space and time can be selected quickly. Here, we select July 19–20, 1996, when Quebec (Canada) received historically extreme precipitation. The plotting library hvplot can then be used to follow the storm throughout the event.

[7]:
%%time

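# Total precipitation (tp, in metres) for July 19-20, 1996 over Quebec.
# ERA5 latitudes are stored from north to south, hence slice(60, 35).
da = ds.tp.sel(time=slice('1996-07-19', '1996-07-20'),
               longitude=slice(-90, -50),
               latitude=slice(60, 35))

# Hide values below 0.001 m (1 mm), load the small subset into memory,
# then animate the storm hour by hour on top of ESRI map tiles
(da.where(da >= 0.001)
   .load()
   .hvplot(groupby='time',
           widget_type='scrubber',
           widget_location='bottom',
           cmap='gist_ncar',
           tiles='ESRI',
           geo=True,
           clim=(0.001, 0.005),
           width=750,
           height=400))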
CPU times: user 7.32 s, sys: 563 ms, total: 7.88 s
Wall time: 42.5 s
[7]:

Because this Zarr version of ERA5 is chunked for time-series analysis (long time chunks, small spatial tiles), the full historical record can be extracted quickly over a small spatial extent, such as a point or a polygon. By contrast, the same query against a collection of NetCDF files, which are typically split and chunked along the time dimension, requires reading a very large number of files and is extremely compute-intensive for large datasets.
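The effect of chunking can be estimated with a back-of-the-envelope count of the chunks a query has to read. The sketch below uses the dimension sizes and the two chunk layouts listed for ERA5 single levels above; the byte figures are upper bounds that assume full, uncompressed float32 chunks, so actual transfers are smaller. Function and variable names are illustrative.

```python
import math

# Dimension sizes and chunk shapes from the ERA5 single-levels description above
SHAPE = {"time": 368184, "latitude": 721, "longitude": 1440}
TIMESERIES_CHUNKS = {"time": 14880, "latitude": 15, "longitude": 15}
SPATIAL_CHUNKS = {"time": 24, "latitude": 721, "longitude": 1440}


def chunks_touched(shape, chunks, selection):
    """Count the chunks overlapped by a rectangular index selection.

    `selection` maps a dimension to a (start, stop) index range, stop exclusive;
    dimensions that are not listed are read in full.
    """
    total = 1
    for dim, size in shape.items():
        start, stop = selection.get(dim, (0, size))
        first, last = start // chunks[dim], (stop - 1) // chunks[dim]
        total *= last - first + 1
    return total


def max_bytes_read(shape, chunks, selection, itemsize=4):
    """Upper bound on bytes read: full float32 chunks, compression ignored."""
    return chunks_touched(shape, chunks, selection) * math.prod(chunks.values()) * itemsize


# Full record at one grid cell: 45N, 75W -> latitude index 180, longitude index 420
point = {"latitude": (180, 181), "longitude": (420, 421)}
# One global map at a single time step
one_map = {"time": (0, 1)}

for name, chunks in [("time series", TIMESERIES_CHUNKS), ("spatial", SPATIAL_CHUNKS)]:
    for label, sel in [("point", point), ("map", one_map)]:
        n = chunks_touched(SHAPE, chunks, sel)
        gb = max_bytes_read(SHAPE, chunks, sel) / 1e9
        print(f"{name:11s} | {label:5s} | {n:6d} chunks | <= {gb:8.2f} GB")
```

With these numbers, the point query touches 25 chunks in the time-series layout but 15,341 (every chunk in the store) in the spatial layout, while a single global map needs 4,704 chunks in the time-series layout but only 1 in the spatial layout. This is why the catalog offers both versions.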

[8]:
%%time
# Convert tp from metres to millimetres and take the grid cell nearest to 45N, 75W
da = (1000 * ds.tp).sel(longitude=-75, latitude=45, method='nearest')

da.hvplot(grid=True, width=800, height=500, color='blue')
CPU times: user 687 ms, sys: 105 ms, total: 792 ms
Wall time: 10.6 s
[8]:
[9]:
%%time
# Annual precipitation totals (mm) at the same grid cell
da = ((1000 * ds.tp)
      .sel(longitude=-75, latitude=45, method='nearest')
      .resample(time='1Y')
      .sum())

# Line plot overlaid with one marker per year
(da.hvplot.line(grid=True, width=800, height=500, color='blue') *
 da.hvplot.scatter(marker='o').opts(color='black', size=14))
CPU times: user 2.1 s, sys: 172 ms, total: 2.27 s
Wall time: 22 s
[9]:

b) ERA5 pressure levels

ERA5 pressure-level data come from the same ECMWF reanalysis, produced by the Copernicus Climate Change Service (C3S), that is described in the previous section; here, the variables are provided on several pressure levels of the atmosphere rather than on single levels.

Temporal extent: 01/01/1979 – 12/31/2019
Spatial extent: Atlantic Northeast: [-96, -52, 40, 63]
Chunks: {'time': 8760, 'longitude': 25, 'latitude': 25, 'level': 1}
Spatial resolution: 0.25 degrees
Spatial reference: WGS84 (EPSG:4326)
Temporal resolution: 1 hour
Update frequency: None

[10]:
ds=cat.atmosphere.era5_reanalysis_pressure_levels.to_dask()
ds
[10]:
<xarray.Dataset>
Dimensions:    (latitude: 93, level: 6, longitude: 177, time: 359400)
Coordinates:
  * latitude   (latitude) float32 63.0 62.75 62.5 62.25 ... 40.5 40.25 40.0
  * level      (level) int32 300 400 500 700 850 1000
  * longitude  (longitude) float32 -96.0 -95.75 -95.5 ... -52.5 -52.25 -52.0
  * time       (time) datetime64[ns] 1979-01-01 ... 2019-12-31T23:00:00
Data variables:
    r          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    t          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    u          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    v          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
    z          (time, level, latitude, longitude) float32 dask.array<chunksize=(8760, 1, 25, 25), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.6
    history:      2019-12-18 03:49:32 GMT by grib_to_netcdf-2.14.0: /opt/ecmw...

Working with the data

[11]:
%%time
# Geopotential (z) at 45N, 75W on four pressure levels, one curve per level
ds.z.sel(longitude=-75, latitude=45,
         level=[500, 700, 850, 1000]).hvplot(grid=True, by='level')
CPU times: user 2.73 s, sys: 459 ms, total: 3.19 s
Wall time: 39.6 s
[11]:
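In ERA5, `z` is geopotential in m² s⁻², not a height, so the curves above are in those units. Dividing by standard gravity (g0 = 9.80665 m s⁻²) gives geopotential height in metres, which is the more familiar way to read pressure-level heights. A minimal sketch; the helper name is illustrative, and because it only divides it works equally on floats, NumPy arrays or xarray objects.

```python
# Standard gravity (m s-2), conventionally used to convert geopotential to height
G0 = 9.80665


def geopotential_height(z):
    """Convert geopotential (m2 s-2) to geopotential height (m)."""
    return z / G0


# With the dataset opened above (not run here):
# geopotential_height(ds.z.sel(longitude=-75, latitude=45, level=500))
```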

c) ERA5-Land

ERA5-Land is a reanalysis dataset describing the evolution of land variables from January 1950 to present at a higher resolution (0.1 degrees) than ERA5. It was produced by replaying the land component of the ECMWF ERA5 reanalysis, driven by ERA5 atmospheric fields, and is distributed by the Copernicus Climate Change Service (C3S).

Temporal extent: 01/01/1950 – present
Spatial extent: North America: [-167, -50, 15, 85]
Chunks (time-series version): {'time': 8760, 'longitude': 7, 'latitude': 7}
Chunks (spatial version): {'time': 24, 'longitude': 1171, 'latitude': 701}
Spatial resolution: 0.1 degrees
Spatial reference: WGS84 (EPSG:4326)
Temporal resolution: 1 hour
Update frequency: in 2023, we will update it monthly

This dataset is expected to become available in December 2022. Once it has been added to the catalog, it can be accessed in the same way as in the ERA5 examples above.

d) Daymet

The Daymet dataset contains daily minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length at 1 km resolution for North America. Annual and monthly summaries are also available. The dataset covers the period from January 1, 1980 to December 31, 2020.

Daymet is available in Zarr format on Azure through the Planetary Computer; here, it is read from those resources into an xarray Dataset.

Temporal extent: 01/01/1980 – 12/31/2020
Spatial extent: North America
Chunks: {'time': 365, 'y': 284, 'x': 584}
Spatial resolution: 1 km
Spatial reference: Lambert Conformal Conic, custom ('+ellps=WGS84 +proj=lcc +lon_0=-100 +lat_0=42.5 +x_0=0.0 +y_0=0.0 +lat_1=25 +lat_2=60 +no_defs')
Temporal resolution: 1 day
Update frequency: None

Data access

[12]:
ds=cat.atmosphere.daymet_daily_na.to_dask()
ds
[12]:
<xarray.Dataset>
Dimensions:                  (time: 14965, y: 8075, x: 7814, nv: 2)
Coordinates:
    lat                      (y, x) float32 dask.array<chunksize=(284, 584), meta=np.ndarray>
    lon                      (y, x) float32 dask.array<chunksize=(284, 584), meta=np.ndarray>
  * time                     (time) datetime64[ns] 1980-01-01T12:00:00 ... 20...
  * x                        (x) float32 -4.56e+06 -4.559e+06 ... 3.253e+06
  * y                        (y) float32 4.984e+06 4.983e+06 ... -3.09e+06
Dimensions without coordinates: nv
Data variables:
    dayl                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    lambert_conformal_conic  int16 ...
    prcp                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    srad                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    swe                      (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    time_bnds                (time, nv) datetime64[ns] dask.array<chunksize=(365, 2), meta=np.ndarray>
    tmax                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    tmin                     (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    vp                       (time, y, x) float32 dask.array<chunksize=(365, 284, 584), meta=np.ndarray>
    yearday                  (time) int16 dask.array<chunksize=(365,), meta=np.ndarray>
Attributes:
    Conventions:       CF-1.6
    Version_data:      Daymet Data Version 4.0
    Version_software:  Daymet Software Version 4.0
    citation:          Please see http://daymet.ornl.gov/ for current Daymet ...
    references:        Please see http://daymet.ornl.gov/ for current informa...
    source:            Daymet Software Version 4.0
    start_year:        1980

Working with the data

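Because Daymet uses a Lambert Conformal Conic projection, its latitude and longitude are 2-D coordinates (lat(y, x) and lon(y, x)) rather than dimensions, so they cannot be queried directly with .sel. We therefore use the xoak library, which builds a spatial index (here a scikit-learn ball tree) on these coordinates to find the nearest grid cell. The data could also be regridded or reprojected to a regular latitude/longitude grid to simplify analysis.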
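In the run shown below, building the xoak index failed with `ValueError('Input contains NaN.')`, which suggests that some chunks of the 2-D lat/lon coordinates contain NaN values. Since `x` and `y` are ordinary 1-D coordinates in metres, an alternative is to project the query point into the Daymet projection and select with `method="nearest"`. The sketch below implements the standard ellipsoidal Lambert Conformal Conic forward equations (as given in Snyder's *Map Projections: A Working Manual*) with the parameters of the PROJ string in the table above. It is an illustration, not the method used in this notebook; in practice a projection library such as pyproj would normally be used, and longitude wrap-around is ignored, which is harmless over North America.

```python
import math

# Daymet LCC parameters, copied from the PROJ string in the table above
LAT_1, LAT_2 = 25.0, 60.0       # standard parallels (degrees)
LAT_0, LON_0 = 42.5, -100.0     # latitude of origin, central meridian (degrees)
A = 6378137.0                   # WGS84 semi-major axis (m)
F_INV = 298.257223563           # WGS84 inverse flattening
E = math.sqrt(2 / F_INV - 1 / F_INV**2)  # first eccentricity of the ellipsoid


def _m(phi):
    """Snyder's m: cos(phi) scaled by the ellipsoid radius of curvature."""
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    """Snyder's t: isometric-latitude term for the ellipsoid."""
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


def lcc_forward(lat, lon):
    """Project geographic lat/lon (degrees) to Daymet x/y (metres); x_0 = y_0 = 0."""
    phi1, phi2, phi0 = map(math.radians, (LAT_1, LAT_2, LAT_0))
    # Cone constant and scaling factor from the two standard parallels
    n = (math.log(_m(phi1)) - math.log(_m(phi2))) / (math.log(_t(phi1)) - math.log(_t(phi2)))
    big_f = _m(phi1) / (n * _t(phi1) ** n)
    # Polar radius of the point and of the latitude of origin
    rho = A * big_f * _t(math.radians(lat)) ** n
    rho0 = A * big_f * _t(phi0) ** n
    theta = n * math.radians(lon - LON_0)
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


x, y = lcc_forward(45.0, -75.0)
# With the Daymet Dataset `ds` opened above (not run here), the nearest grid cell
# could then be selected without a lat/lon index:
# ds.tmax.sel(x=x, y=y, method="nearest")
```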

[13]:
%%time
ds = ds.sel(time=slice('2000-01-01', '2000-12-31'))

# Query point (45N, 75W); xoak expects the indexers to share a dimension
points = xr.Dataset({"lat": ("points", [45.0]),
                     "lon": ("points", [-75.0])})

# Build a single ball-tree index on the 2-D lat/lon coordinates,
# shared by all variables instead of one index per variable
ds.xoak.set_index(["lat", "lon"], "sklearn_geo_balltree")

# Nearest grid cell to the query point, then drop the length-1 "points" dimension
site = ds.xoak.sel(lat=points.lat, lon=points.lon).squeeze("points")

(site.tmax.hvplot(grid=True, value_label='daily temperature (degrees C)') *
 site.tmin.hvplot(grid=True) +
 site.prcp.hvplot(grid=True) +
 site.swe.hvplot(grid=True)
).cols(1)

2022-11-17 16:51:09,551 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-924b33fd-4110-4598-934f-03c9c873c05d
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  21.667233, -145.48022 ],
       [  21.67183 , -145.47212 ],
       [  21.676428, -145.464   ],
       ...,
       [  26.926159,  -66.034355],
       [  26.922552,  -66.025024],
       [  26.918943,  -66.01569 ]], dtype=float32), 46602696)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,051 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-188082e7-a70c-4c56-9401-9f9c3a34ced0
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.606504, -144.80072 ],
       [  20.611012, -144.79268 ],
       [  20.615519, -144.78464 ],
       ...,
       [  25.756577,  -66.589096],
       [  25.753044,  -66.57988 ],
       [  25.74951 ,  -66.57067 ]], dtype=float32), 47696656)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,066 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-0bb7e8c1-180a-4a53-a732-e5db9a883f5e
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.606504, -144.80072 ],
       [  20.611012, -144.79268 ],
       [  20.615519, -144.78464 ],
       ...,
       [  25.756577,  -66.589096],
       [  25.753044,  -66.57988 ],
       [  25.74951 ,  -66.57067 ]], dtype=float32), 47696656)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,079 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-7268dae7-2991-421d-8996-0ac0d8ba4783
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.606504, -144.80072 ],
       [  20.611012, -144.79268 ],
       [  20.615519, -144.78464 ],
       ...,
       [  25.756577,  -66.589096],
       [  25.753044,  -66.57988 ],
       [  25.74951 ,  -66.57067 ]], dtype=float32), 47696656)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,119 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-4c332af6-7261-4a91-978f-3848c85abc8c
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.455103, -144.70514 ],
       [  20.459599, -144.69711 ],
       [  20.464094, -144.68909 ],
       ...,
       [  25.589743,  -66.66695 ],
       [  25.58622 ,  -66.65775 ],
       [  25.582697,  -66.64856 ]], dtype=float32), 47852936)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,129 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-40e12293-44df-4874-b413-417bdebf396e
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.455103, -144.70514 ],
       [  20.459599, -144.69711 ],
       [  20.464094, -144.68909 ],
       ...,
       [  25.589743,  -66.66695 ],
       [  25.58622 ,  -66.65775 ],
       [  25.582697,  -66.64856 ]], dtype=float32), 47852936)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,144 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-ea3f5fac-880e-4dcf-b263-95ae52fe194b
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.455103, -144.70514 ],
       [  20.459599, -144.69711 ],
       [  20.464094, -144.68909 ],
       ...,
       [  25.589743,  -66.66695 ],
       [  25.58622 ,  -66.65775 ],
       [  25.582697,  -66.64856 ]], dtype=float32), 47852936)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,159 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-817aeae2-9ba4-4b8e-85b7-8c690a2edda9
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.455103, -144.70514 ],
       [  20.459599, -144.69711 ],
       [  20.464094, -144.68909 ],
       ...,
       [  25.589743,  -66.66695 ],
       [  25.58622 ,  -66.65775 ],
       [  25.582697,  -66.64856 ]], dtype=float32), 47852936)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,173 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-d17762e4-caf7-4246-8e19-3ff257a1462a
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.303741, -144.60994 ],
       [  20.308224, -144.60191 ],
       [  20.312704, -144.59389 ],
       ...,
       [  25.422976,  -66.74446 ],
       [  25.419464,  -66.735275],
       [  25.41595 ,  -66.7261  ]], dtype=float32), 48009216)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,187 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-a8bef8ca-06e9-4993-8d4c-35b88efb573c
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.303741, -144.60994 ],
       [  20.308224, -144.60191 ],
       [  20.312704, -144.59389 ],
       ...,
       [  25.422976,  -66.74446 ],
       [  25.419464,  -66.735275],
       [  25.41595 ,  -66.7261  ]], dtype=float32), 48009216)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,216 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-a0bee4d3-5ba4-40c2-b7d4-1d8baa99f94d
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.303741, -144.60994 ],
       [  20.308224, -144.60191 ],
       [  20.312704, -144.59389 ],
       ...,
       [  25.422976,  -66.74446 ],
       [  25.419464,  -66.735275],
       [  25.41595 ,  -66.7261  ]], dtype=float32), 48009216)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,218 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-34964f67-9981-4b4a-9a48-ca02b92bd518
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.152418, -144.51509 ],
       [  20.156887, -144.50708 ],
       [  20.161356, -144.49907 ],
       ...,
       [  25.256277,  -66.821625],
       [  25.252775,  -66.81246 ],
       [  25.249273,  -66.8033  ]], dtype=float32), 48165496)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,228 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-7a2706fa-61ff-4585-9326-842d6b93df4b
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.152418, -144.51509 ],
       [  20.156887, -144.50708 ],
       [  20.161356, -144.49907 ],
       ...,
       [  25.256277,  -66.821625],
       [  25.252775,  -66.81246 ],
       [  25.249273,  -66.8033  ]], dtype=float32), 48165496)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,241 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-46f018bd-08fd-476f-9ea9-cf2d5243f338
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.152418, -144.51509 ],
       [  20.156887, -144.50708 ],
       [  20.161356, -144.49907 ],
       ...,
       [  25.256277,  -66.821625],
       [  25.252775,  -66.81246 ],
       [  25.249273,  -66.8033  ]], dtype=float32), 48165496)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,252 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-70a5269b-9f0d-46f9-9908-3b196b5e1909
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.152418, -144.51509 ],
       [  20.156887, -144.50708 ],
       [  20.161356, -144.49907 ],
       ...,
       [  25.256277,  -66.821625],
       [  25.252775,  -66.81246 ],
       [  25.249273,  -66.8033  ]], dtype=float32), 48165496)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,264 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-c752e6d7-e756-4231-bac1-ba6d10f8940f
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.001135, -144.42061 ],
       [  20.005592, -144.4126  ],
       [  20.010048, -144.4046  ],
       ...,
       [  25.08965 ,  -66.89846 ],
       [  25.086159,  -66.88931 ],
       [  25.082666,  -66.880165]], dtype=float32), 48321776)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,270 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-17471b15-1b10-4fa4-8500-d44d1e7d06ca
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.001135, -144.42061 ],
       [  20.005592, -144.4126  ],
       [  20.010048, -144.4046  ],
       ...,
       [  25.08965 ,  -66.89846 ],
       [  25.086159,  -66.88931 ],
       [  25.082666,  -66.880165]], dtype=float32), 48321776)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,284 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-45b1e6bb-f559-4893-9bcf-8a8b65e1499c
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.001135, -144.42061 ],
       [  20.005592, -144.4126  ],
       [  20.010048, -144.4046  ],
       ...,
       [  25.08965 ,  -66.89846 ],
       [  25.086159,  -66.88931 ],
       [  25.082666,  -66.880165]], dtype=float32), 48321776)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,285 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-098fd843-91e9-4872-809c-c167fc41cb41
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.303741, -144.60994 ],
       [  20.308224, -144.60191 ],
       [  20.312704, -144.59389 ],
       ...,
       [  25.422976,  -66.74446 ],
       [  25.419464,  -66.735275],
       [  25.41595 ,  -66.7261  ]], dtype=float32), 48009216)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,303 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-332d935d-4507-420b-afa0-9e43f369698b
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  20.001135, -144.42061 ],
       [  20.005592, -144.4126  ],
       [  20.010048, -144.4046  ],
       ...,
       [  25.08965 ,  -66.89846 ],
       [  25.086159,  -66.88931 ],
       [  25.082666,  -66.880165]], dtype=float32), 48321776)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,368 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-9670631e-5eeb-40c0-a198-4c362145a9a5
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.849895, -144.32648 ],
       [  19.85434 , -144.3185  ],
       [  19.858782, -144.3105  ],
       ...,
       [  24.923094,  -66.97495 ],
       [  24.919613,  -66.96583 ],
       [  24.916132,  -66.956696]], dtype=float32), 48478056)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,378 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-12c9a5c4-7f6a-4e24-9707-dd3d251124fc
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.849895, -144.32648 ],
       [  19.85434 , -144.3185  ],
       [  19.858782, -144.3105  ],
       ...,
       [  24.923094,  -66.97495 ],
       [  24.919613,  -66.96583 ],
       [  24.916132,  -66.956696]], dtype=float32), 48478056)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,387 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-6108e337-9683-4504-8445-9e19624b107d
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.849895, -144.32648 ],
       [  19.85434 , -144.3185  ],
       [  19.858782, -144.3105  ],
       ...,
       [  24.923094,  -66.97495 ],
       [  24.919613,  -66.96583 ],
       [  24.916132,  -66.956696]], dtype=float32), 48478056)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,387 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-77c9d06a-1efb-4b78-a6cb-7f49f5fd7fbd
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.698698, -144.23273 ],
       [  19.703129, -144.22475 ],
       [  19.70756 , -144.21675 ],
       ...,
       [  24.723324,  -67.06631 ],
       [  24.719854,  -67.0572  ],
       [  24.716387,  -67.04809 ]], dtype=float32), 48634336)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,397 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-77a40ee5-b725-4499-93a6-3c97042d3071
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.698698, -144.23273 ],
       [  19.703129, -144.22475 ],
       [  19.70756 , -144.21675 ],
       ...,
       [  24.723324,  -67.06631 ],
       [  24.719854,  -67.0572  ],
       [  24.716387,  -67.04809 ]], dtype=float32), 48634336)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,408 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-44771b84-fb95-4cad-a569-98f9a06153dd
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.698698, -144.23273 ],
       [  19.703129, -144.22475 ],
       [  19.70756 , -144.21675 ],
       ...,
       [  24.723324,  -67.06631 ],
       [  24.719854,  -67.0572  ],
       [  24.716387,  -67.04809 ]], dtype=float32), 48634336)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,416 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-e4ef47de-cb62-473a-b808-b97321061cc5
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.698698, -144.23273 ],
       [  19.703129, -144.22475 ],
       [  19.70756 , -144.21675 ],
       ...,
       [  24.723324,  -67.06631 ],
       [  24.719854,  -67.0572  ],
       [  24.716387,  -67.04809 ]], dtype=float32), 48634336)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

2022-11-17 16:51:10,570 - distributed.worker - WARNING - Compute Failed
Key:       XoakIndexWrapper-4d755756-fde2-4985-9a23-d781a936a4c5
Function:  XoakIndexWrapper
args:      ('sklearn_geo_balltree', array([[  19.849895, -144.32648 ],
       [  19.85434 , -144.3185  ],
       [  19.858782, -144.3105  ],
       ...,
       [  24.923094,  -66.97495 ],
       [  24.919613,  -66.96583 ],
       [  24.916132,  -66.956696]], dtype=float32), 48478056)
kwargs:    {}
Exception: "ValueError('Input contains NaN.')"

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File <timed exec>:21

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/xoak/accessor.py:257, in XoakAccessor.sel(self, indexers, **indexers_kwargs)
    253 indices = self._query(indexers)
    255 if not isinstance(indices, np.ndarray):
    256     # TODO: remove (see todo below)
--> 257     indices = indices.compute()
    259 pos_indexers = self._get_pos_indexers(indices, indexers)
    261 # TODO: issue in xarray. 1-dimensional xarray.Variables are always considered
    262 # as OuterIndexer, while we want here VectorizedIndexer
    263 # This would also allow lazy selection

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/dask/base.py:315, in DaskMethodsMixin.compute(self, **kwargs)
    291 def compute(self, **kwargs):
    292     """Compute this dask collection
    293
    294     This turns a lazy Dask collection into its in-memory equivalent.
   (...)
    313     dask.base.compute
    314     """
--> 315     (result,) = compute(self, traverse=False, **kwargs)
    316     return result

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/dask/base.py:600, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
    597     keys.append(x.__dask_keys__())
    598     postcomputes.append(x.__dask_postcompute__())
--> 600 results = schedule(dsk, keys, **kwargs)
    601 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/client.py:3122, in Client.get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   3120         should_rejoin = False
   3121 try:
-> 3122     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   3123 finally:
   3124     for f in futures.values():

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/client.py:2291, in Client.gather(self, futures, errors, direct, asynchronous)
   2289 else:
   2290     local_worker = None
-> 2291 return self.sync(
   2292     self._gather,
   2293     futures,
   2294     errors=errors,
   2295     direct=direct,
   2296     local_worker=local_worker,
   2297     asynchronous=asynchronous,
   2298 )

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/utils.py:339, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    337     return future
    338 else:
--> 339     return sync(
    340         self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    341     )

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/utils.py:406, in sync(loop, func, callback_timeout, *args, **kwargs)
    404 if error:
    405     typ, exc, tb = error
--> 406     raise exc.with_traceback(tb)
    407 else:
    408     return result

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/utils.py:379, in sync.<locals>.f()
    377         future = asyncio.wait_for(future, callback_timeout)
    378     future = asyncio.ensure_future(future)
--> 379     result = yield future
    380 except Exception:
    381     error = sys.exc_info()

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/tornado/gen.py:762, in Runner.run(self)
    759 exc_info = None
    761 try:
--> 762     value = future.result()
    763 except Exception:
    764     exc_info = sys.exc_info()

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/distributed/client.py:2154, in Client._gather(self, futures, errors, direct, local_worker)
   2152         exc = CancelledError(key)
   2153     else:
-> 2154         raise exception.with_traceback(traceback)
   2155     raise exc
   2156 if errors == "skip":

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/xoak/index/base.py:227, in __init__()
    224 index_adapter_cls = normalize_index(index_adapter)
    226 self._index_adapter = index_adapter_cls(**kwargs)
--> 227 self._index = self._index_adapter.build(points)
    228 self._offset = offset

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/xoak/index/sklearn_adapters.py:55, in build()
     54 def build(self, points):
---> 55     return BallTree(np.deg2rad(points), **self._index_options)

File sklearn/neighbors/_binary_tree.pxi:833, in sklearn.neighbors._ball_tree.BinaryTree.__init__()

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/sklearn/utils/validation.py:899, in check_array()
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File /usr/share/miniconda/envs/catalogs/lib/python3.8/site-packages/sklearn/utils/validation.py:146, in _assert_all_finite()
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN.
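The traceback shows where the failure comes from. xoak's `sklearn_geo_balltree` adapter converts the (latitude, longitude) points to radians and passes them to scikit-learn's `BallTree`, which refuses NaN input, so the index cannot be built for chunks whose coordinates contain NaN. One common workaround is to exclude cells with non-finite coordinates before indexing and map the result back to the original grid. The sketch below illustrates that idea with a brute-force haversine search in NumPy. The function `nearest_valid_point` is hypothetical and is not xoak's API. For large grids a tree index, as used by xoak, is much faster than this O(n) search.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def nearest_valid_point(lat2d, lon2d, qlat, qlon):
    """Return ((row, col), distance_km) of the 2-D grid cell closest to (qlat, qlon).

    Cells whose latitude or longitude is NaN are skipped instead of making
    the whole search fail. Brute-force haversine distance, O(n) per query.
    """
    lat = np.asarray(lat2d, dtype=float)
    lon = np.asarray(lon2d, dtype=float)
    valid = np.isfinite(lat) & np.isfinite(lon)
    if not valid.any():
        raise ValueError("no cell has finite coordinates")
    flat_idx = np.flatnonzero(valid)  # positions of valid cells in the flattened grid
    phi1, lam1 = np.deg2rad(qlat), np.deg2rad(qlon)
    phi2 = np.deg2rad(lat.ravel()[flat_idx])
    lam2 = np.deg2rad(lon.ravel()[flat_idx])
    a = (np.sin((phi2 - phi1) / 2) ** 2
         + np.cos(phi1) * np.cos(phi2) * np.sin((lam2 - lam1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    best = int(np.argmin(dist))
    row, col = np.unravel_index(flat_idx[best], lat.shape)
    return (int(row), int(col)), float(dist[best])


# Hypothetical 2 x 2 curvilinear grid with one missing coordinate pair
lat2d = np.array([[20.0, 20.0], [np.nan, 21.0]])
lon2d = np.array([[-145.0, -144.0], [np.nan, -144.0]])
print(nearest_valid_point(lat2d, lon2d, 21.1, -144.1)[0])  # (1, 1)
```

The same mask, `np.isfinite(lat) & np.isfinite(lon)`, can be used to drop invalid cells before handing the points to any tree-based index.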

e) SCDNA (extent: North America)

Station-based serially complete datasets (SCDs) of precipitation and temperature observations are important for hydrometeorological studies. Motivated by the lack of serially complete station observations for North America, the SCD for North America (SCDNA) was developed from station data for 1979 to 2018. It includes daily precipitation, minimum temperature (Tmin) and maximum temperature (Tmax) for 27,276 stations. Raw meteorological station data were obtained from the Global Historical Climatology Network Daily (GHCN-D), the Global Surface Summary of the Day (GSOD), Environment and Climate Change Canada (ECCC) and a compiled station database for Mexico. Only stations with records of at least 8 years were selected; their locations were corrected and their data were subjected to strict quality control. Outputs from three reanalysis products (ERA5, JRA-55 and MERRA-2) provided auxiliary information to estimate missing station records.

Property          Values
Temporal extent   01/01/1979 – 12/31/2018
Spatial extent    North America: [-177, -52, 7, 83] (lon_min, lon_max, lat_min, lat_max)
Chunks            {'time': 1000, 'ID': 1000}

[14]:
ds=cat.atmosphere.scdna.to_dask()
ds
[14]:
<xarray.Dataset>
Dimensions:    (ID: 27276, time: 14610)
Coordinates:
  * ID         (ID) <U13 'GS91066022701' 'GHMQW00022701' ... 'ECCA008402568'
    elevation  (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    latitude   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    longitude  (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
  * time       (time) datetime64[ns] 1979-01-01 1979-01-02 ... 2018-12-31
Data variables:
    prcp       (ID, time) float32 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    prcp_flag  (ID, time) float64 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    prcp_kge   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    sflag      (ID) <U3 dask.array<chunksize=(1000,), meta=np.ndarray>
    tmax       (ID, time) float32 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmax_flag  (ID, time) float64 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmax_kge   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
    tmin       (ID, time) float32 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmin_flag  (ID, time) float64 dask.array<chunksize=(1000, 1000), meta=np.ndarray>
    tmin_kge   (ID) float32 dask.array<chunksize=(1000,), meta=np.ndarray>
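The Chunks entry and the Dimensions line together determine how much data a query has to read. The following short sketch of that arithmetic uses only the standard library. It assumes a daily time axis starting on 1979-01-01, as the repr shows, and uncompressed float32 values. Zarr stores compressed chunks, so actual transfers are usually smaller.

```python
import math
from datetime import date

shape = {"ID": 27276, "time": 14610}   # Dimensions line of the repr
chunks = {"ID": 1000, "time": 1000}    # Chunks entry of the property table
BYTES_FLOAT32 = 4

# Number of chunks along each dimension (the last one may be partial)
grid = {dim: math.ceil(shape[dim] / chunks[dim]) for dim in shape}
n_chunks = math.prod(grid.values())
chunk_mb = chunks["ID"] * chunks["time"] * BYTES_FLOAT32 / 1e6
total_gb = shape["ID"] * shape["time"] * BYTES_FLOAT32 / 1e9
print(grid, n_chunks, chunk_mb)   # {'ID': 28, 'time': 15} 420 4.0

# Positions of the 48 h window 1996-07-19 to 1996-07-20 on the daily axis
start = (date(1996, 7, 19) - date(1979, 1, 1)).days   # 6409
stop = (date(1996, 7, 20) - date(1979, 1, 1)).days    # 6410
time_chunks = stop // chunks["time"] - start // chunks["time"] + 1
print(time_chunks * grid["ID"])   # 28 prcp chunks cover that window for all stations
```

Because both days fall inside a single time chunk, mapping all stations for that window reads 28 chunks of `prcp`, roughly 112 MB before compression, out of about 1.6 GB for the full variable.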
[15]:
%%time
(
    ds.prcp
    .sel(time=slice('1996-07-19', '1996-07-20'))  # 48 h window, both days included
    .sum('time')                                   # NaN values are skipped by default
    .to_dataframe()
    .query('prcp > 0')  # keep stations with rain; filtering on prcp only, so stations at 0 m elevation are kept
    .hvplot.points(x='longitude',
                   y='latitude',
                   color='prcp',
                   geo=True,
                   alpha=0.5,
                   xlim=(-180, -30),
                   ylim=(0, 72),
                   tiles='ESRI',
                   cmap='gist_ncar',
                   clim=(0, 100),
                   hover_cols=['ID', 'prcp'],
                   width=700,
                   height=400,
                   title='48h precipitation during Saguenay flood event')
)
CPU times: user 1.4 s, sys: 141 ms, total: 1.54 s
Wall time: 21.2 s
[15]:
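The cell above sums each station's daily precipitation over the two days and keeps only stations with a non-zero total. The following pandas-only sketch runs on hypothetical toy data; the stations A, B and C are made up. It illustrates one subtlety: `sum` skips NaN by default in both pandas and xarray, so a station with only missing values also totals 0 and disappears from the map along with the dry stations.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: daily precipitation (mm) for three made-up stations
dates = pd.date_range("1996-07-18", periods=3, freq="D")
prcp = pd.DataFrame(
    {
        "A": [5.0, 40.0, 60.0],       # wet station
        "B": [0.0, 0.0, 0.0],         # dry station
        "C": [1.0, np.nan, np.nan],   # missing values during the event
    },
    index=dates,
)

# 48 h window: label slicing includes both end dates, as with xarray .sel
window = prcp.loc["1996-07-19":"1996-07-20"]
total = window.sum()          # NaN is skipped, so station C totals 0.0
wet = total[total > 0]        # keep only stations that received rain
print(wet)                    # only station A remains, with 100.0 mm
```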

f) 20th Century Reanalysis - single levels (extent: Atlantic Northeast)

Using a state-of-the-art data assimilation system and surface pressure observations, the NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) project has generated a four-dimensional global atmospheric dataset of weather spanning 1836 to 2015 to place current atmospheric circulation patterns into a historical perspective.

Property              Values
Temporal extent       01/01/1836 – 12/31/2015
Spatial extent        Atlantic Northeast: [-96, -52, 40, 63] (lon_min, lon_max, lat_min, lat_max)
Chunks                {'time': 32872, 'longitude': 6, 'latitude': 3}
Spatial resolution    1 degree
Spatial reference     WGS84 (EPSG:4326)
Temporal resolution   3 hours
Update frequency      None
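The property table is consistent with the dimensions reported in the dataset repr. A 1-degree grid over [-96, -52] in longitude and [40, 63] in latitude, with both bounds included, gives 45 × 24 points. 3-hourly steps from 1836-01-01T00:00 to 2015-12-31T21:00 give 525,952 time steps. Here is a quick check with the standard library:

```python
from datetime import datetime, timedelta

bbox = [-96, -52, 40, 63]        # Spatial extent: [lon_min, lon_max, lat_min, lat_max]
res = 1.0                        # Spatial resolution (degrees)
step = timedelta(hours=3)        # Temporal resolution

# Both bounds are grid points, hence the + 1
n_lon = int(round((bbox[1] - bbox[0]) / res)) + 1
n_lat = int(round((bbox[3] - bbox[2]) / res)) + 1
# Last step is 2015-12-31T21:00, so count steps up to (not including) 2016-01-01
n_time = (datetime(2016, 1, 1) - datetime(1836, 1, 1)) // step
print(n_time, n_lat, n_lon)      # 525952 24 45
```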

Data access

[16]:
ds=cat.atmosphere['20_century_reanalysis_single_levels'].to_dask()
ds
[16]:
<xarray.Dataset>
Dimensions:    (time: 525952, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    apcp       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    cape       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    crain      (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    pr_wtr     (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    prate      (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    tcdc       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    tmax       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
    tmin       (time, latitude, longitude) float32 dask.array<chunksize=(32872, 3, 6), meta=np.ndarray>
Attributes: (12/24)
    Conventions:               CF-1.2
    References:                https://www.psl.noaa.gov/data/gridded/data.20t...
    assimilation_algorithm:    Ensemble Kalman Filter with 4DIAU
    citation:                  Compo,G.P. <https://www.psl.noaa.gov/people/gi...
    citation1:                 Slivinski, L. C, G. P. Compo, J. S. Whitaker, ...
    comments:                  Data are from \nNOAA/CIRES/DOE 20th Century Re...
    ...                        ...
    product:                   reanalysis
    source:                    20CRv3si 2018, Ensemble Kalman Filter, ocean (...
    spatial_resolution:        1.0 degree
    standard_name_vocabulary:  NetCDF Climate and Forecast (CF) Metadata Conv...
    title:                     8x Daily NOAA/CIRES/DOE 20th Century Reanalysi...
    version:                   3si

Working with the data

Here we plot the time series of the precipitation rate (prate) at the grid point 45°N, 75°W:

[17]:
%%time
(
    ds.sel(latitude=45, longitude=-75)  # exact match: the grid has 1-degree integer coordinates
    .prate
    .hvplot(grid=True)
)
CPU times: user 291 ms, sys: 22.1 ms, total: 313 ms
Wall time: 3.88 s
[17]:
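`prate` is a rate rather than an accumulation. This sketch assumes two things. First, that it is expressed in kg m-2 s-1, which is worth confirming with `ds.prate.attrs`. Second, that each 3-hourly value represents the mean rate over its interval. Under those assumptions, daily totals in mm follow from multiplying the daily mean rate by 86,400 s, because 1 kg of water spread over 1 m² is a 1 mm layer. Here is a pandas sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

SECONDS_PER_DAY = 86_400

# Hypothetical 3-hourly precipitation rates (kg m-2 s-1) over two days
times = pd.date_range("2000-01-01", periods=16, freq=pd.Timedelta(hours=3))
prate = pd.Series(np.full(16, 1e-4), index=times)
prate.iloc[8:] = 2e-4   # second day rains twice as hard

# Assumes each value is the mean rate over its 3 h interval
daily_mm = prate.resample("D").mean() * SECONDS_PER_DAY
print(daily_mm)         # about 8.64 mm on day 1 and 17.28 mm on day 2
```

With xarray, the equivalent operation on the selected point would be `.resample(time='1D').mean() * 86400`.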

g) 20th Century Reanalysis - single levels (large area: chunked for analysis in space)

This entry contains the same NOAA-CIRES-DOE Twentieth Century Reanalysis (20CR) single-level variables, period and domain as in f), but the data are chunked for analysis in space. Each chunk holds the whole domain for 100 consecutive 3-hourly time steps, instead of long time series over small tiles.

Property              Values
Temporal extent       01/01/1836 – 12/31/2015
Spatial extent        Atlantic Northeast: [-96, -52, 40, 63] (lon_min, lon_max, lat_min, lat_max)
Chunks                {'time': 100, 'longitude': 45, 'latitude': 24}
Spatial resolution    1 degree
Spatial reference     WGS84 (EPSG:4326)
Temporal resolution   3 hours
Update frequency      None

[18]:
ds=cat.atmosphere['20_century_reanalysis_single_levels_large_area'].to_dask()
ds
[18]:
<xarray.Dataset>
Dimensions:    (time: 525952, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    apcp       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    cape       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    crain      (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    pr_wtr     (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    prate      (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    tcdc       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    tmax       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
    tmin       (time, latitude, longitude) float32 dask.array<chunksize=(100, 24, 45), meta=np.ndarray>
Attributes: (12/24)
    Conventions:               CF-1.2
    References:                https://www.psl.noaa.gov/data/gridded/data.20t...
    assimilation_algorithm:    Ensemble Kalman Filter with 4DIAU
    citation:                  Compo,G.P. <https://www.psl.noaa.gov/people/gi...
    citation1:                 Slivinski, L. C, G. P. Compo, J. S. Whitaker, ...
    comments:                  Data are from \nNOAA/CIRES/DOE 20th Century Re...
    ...                        ...
    product:                   reanalysis
    source:                    20CRv3si 2018, Ensemble Kalman Filter, ocean (...
    spatial_resolution:        1.0 degree
    standard_name_vocabulary:  NetCDF Climate and Forecast (CF) Metadata Conv...
    title:                     8x Daily NOAA/CIRES/DOE 20th Century Reanalysi...
    version:                   3si

Working with the data

[19]:
%%time
(
    ds.sel(time='2000-01-01T00:00')
    .tmax
    .hvplot(grid=True,
            cmap='cwr',
            geo=True,
            tiles='CartoLight',
            alpha=0.75,
            width=700,
            height=400)
)
CPU times: user 138 ms, sys: 0 ns, total: 138 ms
Wall time: 136 ms
[19]:
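Sections f) and g) expose the same variables with two different chunk layouts. Counting how many chunks each access pattern intersects shows why both exist:

* A single-point time series: the layout of f) serves it from 16 chunks, while the layout of g) needs one chunk per 100 time steps, or 5,260 chunks.
* A single map at one time step: it costs 64 chunks in f) but only 1 in g).

The helper `chunks_touched` is a hypothetical illustration, not part of Dask, Zarr or intake. The index positions are derived from the coordinates in the reprs: latitude starts at 40, longitude at -96, both with a 1-degree step.

```python
from datetime import datetime, timedelta


def chunks_touched(shape, chunks, selection):
    """Number of chunks intersected by a selection.

    selection maps a dimension to a (start, stop) index range, stop exclusive;
    dimensions missing from selection are read in full.
    """
    total = 1
    for dim, size in shape.items():
        start, stop = selection.get(dim, (0, size))
        c = chunks[dim]
        total *= (stop - 1) // c - start // c + 1
    return total


shape = {"time": 525952, "latitude": 24, "longitude": 45}
layout_f = {"time": 32872, "latitude": 3, "longitude": 6}   # chunks of section f)
layout_g = {"time": 100, "latitude": 24, "longitude": 45}   # chunks of section g)

# Point 45N, 75W over the whole period
lat_i, lon_i = 45 - 40, -75 - (-96)
point_series = {"latitude": (lat_i, lat_i + 1), "longitude": (lon_i, lon_i + 1)}

# Full map at 2000-01-01T00:00 on a 3-hourly axis starting 1836-01-01T00:00
t_i = (datetime(2000, 1, 1) - datetime(1836, 1, 1)) // timedelta(hours=3)
single_map = {"time": (t_i, t_i + 1)}

for name, layout in [("f)", layout_f), ("g)", layout_g)]:
    print(name, chunks_touched(shape, layout, point_series),
          chunks_touched(shape, layout, single_map))
# f) 16 64
# g) 5260 1
```

In short, pick the time-oriented copy for long series at a few locations and the space-oriented copy for maps or spatial statistics over short periods.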

Other datasets

The previous examples can also be applied to the following datasets, which we leave for the reader to explore.

[20]:
ds=cat.atmosphere['20_century_reanalysis_pressure_levels'].to_dask()
ds
[20]:
<xarray.Dataset>
Dimensions:    (time: 525952, level: 17, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * level      (level) float64 1.0 5.0 10.0 20.0 ... 700.0 800.0 900.0 1e+03
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    air        (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
    hgt        (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
    omega      (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
    rhum       (time, level, latitude, longitude) float32 dask.array<chunksize=(29200, 1, 24, 25), meta=np.ndarray>
Attributes: (12/25)
    Conventions:                     CF-1.2
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.esrl.noaa.gov/psd/data/gridd...
    assimilation_algorithm:          Ensemble Kalman Filter with 4DIAU
    citation:                        Compo,G.P. <https://www.esrl.noaa.gov/ps...
    citation1:                       Slivinski, L. C, G. P. Compo, J. S. Whit...
    ...                              ...
    product:                         reanalysis
    source:                          20CRv3si 2018, Ensemble Kalman Filter, o...
    spatial_resolution:              1.0 degree
    standard_name_vocabulary:        NetCDF Climate and Forecast (CF) Metadat...
    title:                           8x Daily NOAA/CIRES/DOE 20th Century Rea...
    version:                         3si
[21]:
ds = cat.atmosphere['20_century_reanalysis_pressure_levels_large_area'].to_dask()
ds
[21]:
<xarray.Dataset>
Dimensions:    (time: 525952, level: 17, latitude: 24, longitude: 45)
Coordinates:
  * latitude   (latitude) float32 40.0 41.0 42.0 43.0 ... 60.0 61.0 62.0 63.0
  * level      (level) float64 1.0 5.0 10.0 20.0 ... 700.0 800.0 900.0 1e+03
  * longitude  (longitude) float32 -96.0 -95.0 -94.0 -93.0 ... -54.0 -53.0 -52.0
  * time       (time) datetime64[ns] 1836-01-01 ... 2015-12-31T21:00:00
Data variables:
    air        (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
    hgt        (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
    omega      (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
    rhum       (time, level, latitude, longitude) float32 dask.array<chunksize=(100, 1, 24, 45), meta=np.ndarray>
Attributes: (12/25)
    Conventions:                     CF-1.2
    DODS_EXTRA.Unlimited_Dimension:  time
    References:                      https://www.esrl.noaa.gov/psd/data/gridd...
    assimilation_algorithm:          Ensemble Kalman Filter with 4DIAU
    citation:                        Compo,G.P. <https://www.esrl.noaa.gov/ps...
    citation1:                       Slivinski, L. C, G. P. Compo, J. S. Whit...
    ...                              ...
    product:                         reanalysis
    source:                          20CRv3si 2018, Ensemble Kalman Filter, o...
    spatial_resolution:              1.0 degree
    standard_name_vocabulary:        NetCDF Climate and Forecast (CF) Metadat...
    title:                           8x Daily NOAA/CIRES/DOE 20th Century Rea...
    version:                         3si
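The two pressure-level entries above share the same dimensions and variables but not the same chunk layout: the first stores 29,200 time steps (roughly ten years of 3-hourly data) per chunk over part of the longitude range, while the `_large_area` entry stores only 100 time steps per chunk over the full domain. Our reading, which the catalog does not state explicitly, is that the first layout favours long time series at a few points and the second favours maps over large areas. The sketch below counts how many chunks each layout must fetch for those two access patterns, using only the shapes and chunk sizes printed in the reprs; `chunks_touched` is a hypothetical helper written for this illustration, not part of xarray or Dask.

```python
import math

# Dimension order: (time, level, latitude, longitude), copied from the reprs above
SHAPE = (525952, 17, 24, 45)
LAYOUTS = {
    "20_century_reanalysis_pressure_levels": (29200, 1, 24, 25),
    "20_century_reanalysis_pressure_levels_large_area": (100, 1, 24, 45),
}
ITEMSIZE = 4  # bytes per float32 value


def chunks_touched(selection, chunks):
    """Count the chunks of a regular chunk grid that intersect a selection.

    `selection` holds one (start, stop) index pair per dimension, stop exclusive.
    """
    count = 1
    for (start, stop), size in zip(selection, chunks):
        count *= (stop - 1) // size - start // size + 1
    return count


# Full time series of one variable at one level and one grid point
POINT_SERIES = ((0, SHAPE[0]), (0, 1), (0, 1), (0, 1))
# Map of one variable at one time step and one level over the whole domain
SINGLE_MAP = ((0, 1), (0, 1), (0, SHAPE[2]), (0, SHAPE[3]))

for name, chunks in LAYOUTS.items():
    chunk_mb = math.prod(chunks) * ITEMSIZE / 1e6
    print(name)
    print(f"  chunk size: {chunk_mb:.2f} MB (uncompressed)")
    print(f"  point time series: {chunks_touched(POINT_SERIES, chunks)} chunks")
    print(f"  single map: {chunks_touched(SINGLE_MAP, chunks)} chunks")
```

With these numbers, a point time series needs 19 chunks in the first layout against 5,260 in the second, while a single map needs 2 chunks of about 70 MB each against 1 chunk of about 0.43 MB. Compression on disk changes the byte counts but not the chunk counts.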
[22]:
ds = cat.atmosphere['terraclimate'].to_dask()
ds
[22]:
<xarray.Dataset>
Dimensions:                 (time: 744, lat: 4320, lon: 8640, crs: 1)
Coordinates:
  * crs                     (crs) int16 3
  * lat                     (lat) float64 89.98 89.94 89.9 ... -89.94 -89.98
  * lon                     (lon) float64 -180.0 -179.9 -179.9 ... 179.9 180.0
  * time                    (time) datetime64[ns] 1958-01-01 ... 2019-12-01
Data variables: (12/18)
    aet                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    def                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    pdsi                    (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    pet                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ppt                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ppt_station_influence   (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ...                      ...
    tmin                    (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    tmin_station_influence  (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    vap                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    vap_station_influence   (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    vpd                     (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
    ws                      (time, lat, lon) float32 dask.array<chunksize=(12, 1440, 1440), meta=np.ndarray>
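TerraClimate illustrates the point made in the introduction about datasets that no longer fit on a local computer. A back-of-the-envelope estimate from the repr above (744 monthly steps on a 4320 × 8640 grid, float32, chunks of 12 × 1440 × 1440) gives the uncompressed size of a single variable. The figures are an upper bound on what would be transferred, since the Zarr store is likely compressed and we have not checked its actual size on disk.

```python
import math

# Shape and chunking of one TerraClimate variable, copied from the repr above
shape = {"time": 744, "lat": 4320, "lon": 8640}
chunks = {"time": 12, "lat": 1440, "lon": 1440}
itemsize = 4  # bytes per float32 value

n_values = math.prod(shape.values())
n_chunks = math.prod(math.ceil(shape[d] / chunks[d]) for d in shape)
chunk_bytes = math.prod(chunks.values()) * itemsize

print(f"values per variable: {n_values:,}")
print(f"uncompressed size per variable: {n_values * itemsize / 1e9:.1f} GB")
print(f"chunks per variable: {n_chunks} of {chunk_bytes / 1e6:.1f} MB each")
```

That is roughly 111 GB per variable, and the repr lists 18 data variables, which is why the store is opened lazily with `to_dask()` and only the chunks a given selection needs are actually read.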
[23]:
# A new dataset covering all RCPs is being created and will replace this one
ds = cat.climate_change['rcp45_day_NAM_22i_raw_zarr'].to_dask()
ds

[23]:
<xarray.Dataset>
Dimensions:    (lat: 258, lon: 600, member_id: 3, time: 34698, bnds: 2)
Coordinates:
  * lat        (lat) float64 12.12 12.38 12.62 12.88 ... 75.62 75.88 76.12 76.38
  * lon        (lon) float64 -171.9 -171.6 -171.4 ... -22.62 -22.38 -22.12
  * member_id  (member_id) <U20 'CanESM2.CRCM5-OUR' ... 'GFDL-ESM2M.CRCM5-OUR'
  * time       (time) datetime64[ns] 2006-01-01T12:00:00 ... 2100-12-31T12:00:00
    time_bnds  (time, bnds) datetime64[ns] dask.array<chunksize=(17349, 2), meta=np.ndarray>
Dimensions without coordinates: bnds
Data variables: (12/15)
    hurs       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    huss       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    pr         (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    prec       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    ps         (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    rsds       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    ...         ...
    tasmin     (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    temp       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    tmax       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    tmin       (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    uas        (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
    vas        (member_id, time, lat, lon) float32 dask.array<chunksize=(3, 1000, 65, 120), meta=np.ndarray>
Attributes: (12/23)
    CORDEX_domain:                  NAM-22
    contact:                        {"GFDL-ESM2M.CRCM5-OUR": "biner.sebastien...
    creation_date:                  {"GFDL-ESM2M.CRCM5-OUR": "2019-02-12 15:2...
    driving_experiment:             {"GFDL-ESM2M.CRCM5-OUR": "GFDL-ESM2M,rcp4...
    driving_experiment_name:        rcp45
    driving_model_ensemble_member:  {"GFDL-ESM2M.CRCM5-OUR": "r1i1p1", "CanES...
    ...                             ...
    references:                     {"GFDL-ESM2M.CRCM5-OUR": "http://www.oura...
    title:                          {"GFDL-ESM2M.CRCM5-OUR": "NA-CORDEX Raw N...
    tracking_id:                    {"GFDL-ESM2M.CRCM5-OUR": "5139ec82-c55f-4...
    version:                        {"GFDL-ESM2M.CRCM5-OUR": "1.1", "CanESM2....
    zarr-dataset-reference:         For dataset documentation, see DOI https:...
    zarr-version:                   1.0
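Several global attributes of this dataset (`contact`, `creation_date`, `version`, ...) appear to hold one value per ensemble member, stored as a JSON-encoded string keyed by `member_id`. Assuming that is the case (the values are truncated in the repr, so we have not checked every attribute), the standard-library `json` module is enough to regroup them per member. `split_member_attrs` and the sample attribute values below are hypothetical, written only for this illustration.

```python
import json


def split_member_attrs(attrs):
    """Regroup attributes whose value is a JSON object into one dict per key of that object.

    Attributes that are not JSON objects (e.g. "rcp45") are returned unchanged in `shared`.
    """
    per_member = {}
    shared = {}
    for key, value in attrs.items():
        parsed = None
        if isinstance(value, str) and value.lstrip().startswith("{"):
            try:
                parsed = json.loads(value)
            except json.JSONDecodeError:
                parsed = None
        if isinstance(parsed, dict):
            for member, member_value in parsed.items():
                per_member.setdefault(member, {})[key] = member_value
        else:
            shared[key] = value
    return per_member, shared


# Hypothetical attributes mimicking the layout of the repr above (member names and values made up)
sample_attrs = {
    "driving_experiment_name": "rcp45",
    "version": '{"MODEL_A.RCM": "1.1", "MODEL_B.RCM": "1.0"}',
    "tracking_id": '{"MODEL_A.RCM": "id-a", "MODEL_B.RCM": "id-b"}',
}
per_member, shared = split_member_attrs(sample_attrs)
print(per_member["MODEL_A.RCM"])  # {'version': '1.1', 'tracking_id': 'id-a'}
print(shared)  # {'driving_experiment_name': 'rcp45'}
```

On the real dataset the helper would be called as `split_member_attrs(ds.attrs)`.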
[24]:
# Sample of the MELCC hydrometric data; still to be completed, and data from other providers will be added
ds = cat.hydrology['melcc'].to_dask()
ds

[24]:
<xarray.Dataset>
Dimensions:                 (basin_id: 470, time: 41007)
Coordinates: (12/16)
    _last_update_timestamp  (basin_id) datetime64[ns] dask.array<chunksize=(470,), meta=np.ndarray>
    aggregation             (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
  * basin_id                (basin_id) <U6 '010101' '010801' ... '135201'
    data_type               (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    drainage_area           (basin_id) float32 dask.array<chunksize=(470,), meta=np.ndarray>
    end_date                (basin_id) datetime64[ns] dask.array<chunksize=(470,), meta=np.ndarray>
    ...                      ...
    regulated               (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    source                  (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    start_date              (basin_id) datetime64[ns] dask.array<chunksize=(470,), meta=np.ndarray>
  * time                    (time) datetime64[ns] 1910-01-01 ... 2022-04-09
    timestep                (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
    units                   (basin_id) <U1 dask.array<chunksize=(470,), meta=np.ndarray>
Data variables:
    flag                    (time, basin_id) <U1 dask.array<chunksize=(2563, 59), meta=np.ndarray>
    value                   (time, basin_id) float32 dask.array<chunksize=(5126, 59), meta=np.ndarray>
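Two details of this repr are easy to trip over. `basin_id` is a 6-character string coordinate (`<U6`, e.g. `'010101'`), so a station number typed as an integer loses its leading zero and will not match; and the time axis has 41,007 steps between 1910-01-01 and 2022-04-09, which is exactly the number of days in that interval, consistent with a daily series. The sketch below checks both with the standard library only; `to_basin_id` is a hypothetical helper that assumes every identifier is a zero-padded 6-digit number, as the sample values suggest.

```python
from datetime import date


def to_basin_id(station):
    """Format a station number as the zero-padded 6-character string used by `basin_id`.

    Assumes identifiers are purely numeric, like '010101' or '135201'.
    """
    return f"{int(station):06d}"


# Number of calendar days from the first to the last time stamp, both included
n_days = (date(2022, 4, 9) - date(1910, 1, 1)).days + 1

print(to_basin_id(10101))  # 010101
print(n_days)  # 41007
```

In the notebook, a single basin's series could then be selected with `ds['value'].sel(basin_id=to_basin_id(10101))`.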